Speech-driven 3D facial animation has been widely explored, with applications in gaming, character animation, virtual reality, and telepresence systems. State-of-the-art methods deform the face topology of the target actor to sync with the input audio without considering the identity-specific speaking style and facial idiosyncrasies of the target actor, thus resulting in unrealistic and inaccurate lip movements. To address this, we present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video and produces novel facial expressions matching the identity-specific speaking style and facial idiosyncrasies of the target actor. Specifically, we train a style-agnostic transformer on a large facial expression dataset, which we use as a prior for audio-driven facial expressions. Based on this prior, we optimize for the identity-specific speaking style using a short reference video. To train the prior, we introduce a novel loss function based on detected bilabial consonants to ensure plausible lip closures and consequently improve the realism of the generated expressions. Through detailed experiments and a user study, we show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
translated by Google Translate
Several face de-identification methods have been proposed to preserve users' privacy by obscuring their faces. These methods, however, can degrade the quality of photos, and they usually do not preserve the utility of faces, e.g., their age, gender, pose, and facial expression. Recently, advanced generative adversarial network models, such as StyleGAN, have been proposed, which generate realistic, high-quality imaginary faces. In this paper, we investigate the use of StyleGAN in generating de-identified faces through style mixing, where the styles or features of the target face and an auxiliary face are mixed to generate a de-identified face that carries the utilities of the target face. We examine this de-identification method with respect to preserving utility and privacy by implementing several face detection, verification, and identification attacks. Through extensive experiments and a comparison with two state-of-the-art face de-identification methods, we show that StyleGAN preserves the quality and utility of the faces much better than the other approaches and that, when the style mixing levels are chosen correctly, it also preserves the privacy of the faces much better than other methods.
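The style mixing step described above can be illustrated with a minimal sketch. It assumes each face is represented by a list of per-layer style codes and that a single hypothetical `crossover` index decides which layers come from the target face and which from the auxiliary face. The abstract does not state which layers are taken from which face, so that split and the function name are assumptions, not the authors' configuration.

```python
def style_mix(target_styles, auxiliary_styles, crossover):
    """Combine per-layer style codes from two faces.

    Assumption for illustration: layers [0, crossover) come from the
    target face and the remaining layers come from the auxiliary face.
    """
    if len(target_styles) != len(auxiliary_styles):
        raise ValueError("both faces need the same number of style layers")
    if not 0 <= crossover <= len(target_styles):
        raise ValueError("crossover must lie between 0 and the layer count")
    return list(target_styles[:crossover]) + list(auxiliary_styles[crossover:])


# Toy usage with placeholder labels standing in for style vectors.
mixed = style_mix(["t0", "t1", "t2", "t3"], ["a0", "a1", "a2", "a3"], 2)
```

Moving `crossover` towards 0 takes more layers from the auxiliary face, which is one way to picture the "style mixing levels" that the abstract says must be chosen correctly.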
This paper proposes embedded Gaussian Process Barrier States (GP-BaS), a methodology to safely control unmodeled dynamics of nonlinear systems using Bayesian learning. Gaussian Processes (GPs) are used to model the dynamics of the safety-critical system, and the learned model is subsequently used in the GP-BaS model. We derive the barrier state dynamics using the GP posterior, which is used to construct a safety-embedded Gaussian process dynamical model (GPDM). We show that the safety-critical system can be controlled to remain inside the safe region as long as we can design a controller that renders the BaS-GPDM's trajectories bounded (or asymptotically stable). The proposed approach overcomes various limitations of earlier attempts at combining GPs with barrier functions by avoiding restrictive assumptions such as linearity of the system with respect to control, the relative degree of the constraints, and the number or nature of the constraints. This work is implemented on various examples of trajectory optimization and control, including optimal stabilization of an unstable linear system, safe trajectory optimization of a Dubins vehicle navigating through an obstacle course, and a quadrotor in an obstacle avoidance task, using GP differentiable dynamic programming (GP-DDP). The proposed framework is capable of maintaining safe optimization and control of unmodeled dynamics and is purely data driven.
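Large language models such as GPT-3 (Brown et al., 2020) can perform arbitrary tasks without fine-tuning after being prompted with only a few labeled examples. An arbitrary task can be reformulated as a natural language prompt, and a language model can be asked to generate the completion, indirectly performing the task in a paradigm known as prompt-based learning. To date, emergent prompt-based learning capabilities have mainly been demonstrated for unidirectional language models. However, bidirectional language models pre-trained with objectives such as masked language modeling produce stronger learned representations for transfer learning. This motivates the possibility of prompting bidirectional models, but their pre-training objectives have made them incompatible with the existing prompting paradigm. We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models. Using the machine translation task as a case study, we prompt the bidirectional mT5 model (Xue et al., 2021) with SAP and demonstrate that its few-shot and zero-shot translations outperform the few-shot translations of unidirectional models such as GPT-3 and XGLM (Lin et al., 2021), despite mT5 having approximately 50% fewer parameters. We further show that SAP is effective on question answering and summarization. For the first time, our results demonstrate that prompt-based learning is an emergent property of a broader class of language models, rather than only unidirectional models.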
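Decentralized SGD (D-SGD) distributes heavy learning tasks across multiple machines (a.k.a. nodes), effectively dividing the workload per node by the size of the system. However, a handful of Byzantine (i.e., misbehaving) nodes can jeopardize the entire learning procedure. This vulnerability is further amplified when the system is asynchronous. Although approaches that confer Byzantine resilience have been proposed, they significantly impact the efficiency of the process, to the point of negating the benefit of decentralization. This naturally raises the question: can decentralized learning simultaneously enjoy Byzantine resilience and a reduced workload per node? We answer positively by proposing \newalgorithm{}, which ensures Byzantine resilience without losing the computational efficiency of D-SGD. Essentially, \newalgorithm{} weakens the impact of Byzantine nodes by reducing the variance in local updates using Polyak's momentum. Then, by establishing coordination among nodes via signed echo broadcast and a nearest-neighbor averaging scheme, we effectively tolerate Byzantine nodes while distributing the overhead among the non-Byzantine nodes. To demonstrate the correctness of our algorithm, we introduce and analyze a novel Lyapunov function that accounts for the non-Markovian model drift arising from the use of momentum. We also demonstrate the efficiency of \newalgorithm{} through experiments on several image classification tasks.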
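Two of the ingredients named above, Polyak's momentum and nearest-neighbor averaging, can be pictured with a small sketch. This is not the authors' algorithm: the exponential-moving-average form of the momentum, the Euclidean distance, and the `num_keep` parameter are assumptions made for illustration, and the signed echo broadcast step is omitted entirely.

```python
import math


def polyak_momentum(prev_momentum, gradient, beta=0.9):
    """Blend the previous momentum with a new stochastic gradient.

    Assumption: an exponential-moving-average form, which smooths out
    noise in the local updates; the paper may use a different form.
    """
    return [beta * m + (1.0 - beta) * g for m, g in zip(prev_momentum, gradient)]


def nearest_neighbor_average(own, received, num_keep):
    """Average a node's own vector with the num_keep received vectors
    closest to it, so that far-away (possibly Byzantine) vectors are ignored.
    """
    closest = sorted(received, key=lambda v: math.dist(own, v))[:num_keep]
    kept = [own] + closest
    return [sum(coords) / len(kept) for coords in zip(*kept)]


# Toy usage: the third received vector is an outlier and is dropped.
own_vector = [1.0, 1.0]
received_vectors = [[1.2, 0.8], [0.8, 1.2], [100.0, -100.0]]
averaged = nearest_neighbor_average(own_vector, received_vectors, num_keep=2)
```

In this toy setting the outlier never enters the average, which hints at why distance-based filtering limits the influence of misbehaving nodes once momentum has reduced the spread among honest updates.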
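We propose a two-stage training approach for developing a single NMT model that translates unseen languages both to and from English. For the first stage, we initialize an encoder-decoder model with pretrained XLM-R and RoBERTa weights, then perform multilingual fine-tuning on parallel data in 25 languages. We find that this model can generalize to zero-shot translation on unseen languages. For the second stage, we leverage this generalization ability to generate synthetic parallel data from monolingual datasets, then train with successive back-translation. The final model extends to the English-to-Many direction while retaining Many-to-English performance. We term our approach EcXTra (English-centric Crosslingual (X) Transfer). Our approach sequentially leverages auxiliary parallel data and monolingual data, and is conceptually simple, using only a standard cross-entropy objective in both stages. The final EcXTra model is evaluated on unsupervised NMT for 8 low-resource languages, achieving a new state of the art for English-to-Kazakh (22.3 > 10.4 BLEU) and competitive performance for the other 15 translation directions.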
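Establishing voxelwise semantic correspondence across distinct imaging modalities is a foundational yet formidable computer vision task. Current multi-modality registration techniques maximize hand-crafted inter-domain similarity functions, are limited in modeling nonlinear intensity relationships and deformations, and may require significant re-engineering or underperform on new tasks, datasets, and domain pairs. This work presents ContraReg, an unsupervised contrastive representation learning approach to multi-modality deformable registration. By projecting learned multi-scale local patch features onto a jointly learned inter-domain embedding space, ContraReg obtains representations useful for non-rigid multi-modality alignment. Experimentally, ContraReg achieves accurate and robust results with smooth and invertible deformations, compared against a series of baselines and ablations on a neonatal T1-T2 brain MRI registration task, with all methods validated over a wide range of deformation regularization strengths.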
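Diffusion processes that evolve according to linear stochastic differential equations are an important family of continuous-time dynamic decision-making models. Optimal policies for them are well studied when the drift matrices are known with certainty. However, little is known about data-driven control of diffusion processes with uncertain drift matrices, as conventional discrete-time analysis techniques are not applicable. In addition, while the task can be viewed as a reinforcement learning problem involving an exploration-exploitation trade-off, ensuring system stability is a fundamental component of designing optimal policies. We establish that the popular Thompson sampling algorithm learns optimal actions fast, incurring only a square-root-of-time regret, and also stabilizes the system in a short time period. To the best of our knowledge, this is the first such result for Thompson sampling in a diffusion process control problem. We validate our theoretical results through empirical simulations with real parameter matrices from two settings, airplane control and blood glucose control. Moreover, we observe that Thompson sampling significantly improves (worst-case) regret compared to state-of-the-art algorithms, suggesting that Thompson sampling explores in a more guarded fashion. Our theoretical analysis involves the characterization of a certain optimality manifold that ties the local geometry of the drift parameters to the optimal control of the diffusion process. We expect this technique to be of broader interest.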
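Accurate control of robots in the real world requires a control system that is capable of taking into account the kinodynamic interactions of the robot with its environment. At high speeds, the dependence of the robot's motion on these kinodynamic interactions becomes more pronounced, making high-speed, accurate robot control a challenging problem. Previous work has shown that learning the inverse kinodynamics (IKD) of the robot can be helpful for high-speed robot control. However, a learned inverse kinodynamic model can only be applied to a limited class of control problems, and different control problems require learning a new IKD model. In this work, we present a new formulation for accurate, high-speed robot control that makes use of a learned forward kinodynamic (FKD) model and nonlinear least squares optimization. By the nature of the formulation, this approach is extensible to a wide array of control problems without requiring the retraining of a new model. We demonstrate the ability of this approach to accurately control a one-tenth-scale robot car at high speeds, and show improved results over baselines.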
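Variance-reduced gradient estimators for policy gradient methods have been one of the main focuses of reinforcement learning research in recent years, as they allow acceleration of the estimation process. We propose a variance-reduced policy gradient method, called SHARP, which incorporates second-order information into stochastic gradient descent (SGD) using momentum with a time-varying learning rate. The SHARP algorithm is parameter-free, achieving an $\epsilon$-approximate stationary point with $O(\epsilon^{-3})$ trajectories while using a batch size of $O(1)$ at each iteration. Unlike most previous work, our proposed algorithm does not require importance sampling, which can compromise the advantage of variance reduction. Moreover, the variance of the estimation error decays at the fast rate of $O(1/t^{2/3})$, where $t$ is the number of iterations. Our extensive experimental evaluations show the effectiveness of the proposed algorithm on various control tasks and its advantage over the state of the art in practice.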